Overview

Dataset statistics

Number of variables: 23
Number of observations: 1000
Missing cells: 317
Missing cells (%): 1.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 179.8 KiB
Average record size in memory: 184.1 B
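The figures above follow from simple cell and row counts. There are 317 missing cells out of 1000 × 23 = 23,000 cells, which is about 1.4%. Spreading 179.8 KiB over 1000 records gives about 184.1 B per record.

The sketch below recomputes the count-based figures for rows held as plain Python dicts. The helper name `dataset_statistics` and the use of `None` for a missing cell are illustrative assumptions, not pandas-profiling's API. In pandas, `df.isna().sum().sum()` and `df.duplicated().sum()` give the same two counts.

```python
def dataset_statistics(rows):
    """Recompute count-based overview figures for a list of dict rows (hypothetical helper)."""
    columns = list(rows[0]) if rows else []
    n_obs, n_vars = len(rows), len(columns)
    total_cells = n_obs * n_vars

    # A missing cell is represented by None in this sketch (an assumption).
    missing = sum(1 for row in rows for col in columns if row[col] is None)

    # A duplicate row repeats an earlier row exactly; the first occurrence is not counted.
    seen, duplicates = set(), 0
    for row in rows:
        key = tuple(row[col] for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)

    return {
        "n_vars": n_vars,
        "n_obs": n_obs,
        "missing_cells": missing,
        "missing_cells_pct": 100 * missing / total_cells if total_cells else 0.0,
        "duplicate_rows": duplicates,
        "duplicate_rows_pct": 100 * duplicates / n_obs if n_obs else 0.0,
    }


# Usage with made-up rows: one exact duplicate and two missing cells.
example_rows = [
    {"state": "NSW", "tenure": 9},
    {"state": "NSW", "tenure": 9},
    {"state": None, "tenure": None},
]
print(dataset_statistics(example_rows))
```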

Variable types

Categorical: 9
Numeric: 11
DateTime: 1
Boolean: 2

Warnings

deceased_indicator has constant value "False" Constant
country has constant value "Australia" Constant
first_name has a high cardinality: 940 distinct values High cardinality
last_name has a high cardinality: 961 distinct values High cardinality
job_title has a high cardinality: 184 distinct values High cardinality
address has a high cardinality: 1000 distinct values High cardinality
Unnamed: 16 is highly correlated with Unnamed: 17 and 2 other fields High correlation
Unnamed: 17 is highly correlated with Unnamed: 16 and 2 other fields High correlation
Unnamed: 18 is highly correlated with Unnamed: 16 and 2 other fields High correlation
Unnamed: 19 is highly correlated with Unnamed: 16 and 2 other fields High correlation
Unnamed: 20 is highly correlated with Rank and 1 other field High correlation
Rank is highly correlated with Unnamed: 20 and 1 other field High correlation
Value is highly correlated with Unnamed: 20 and 1 other field High correlation
postcode is highly correlated with property_valuation High correlation
property_valuation is highly correlated with postcode High correlation
Rank is highly correlated with Value and 1 other field High correlation
Unnamed: 17 is highly correlated with Unnamed: 19 and 3 other fields High correlation
job_industry_category is highly correlated with gender High correlation
Unnamed: 19 is highly correlated with Unnamed: 17 and 2 other fields High correlation
gender is highly correlated with job_industry_category High correlation
state is highly correlated with postcode High correlation
Unnamed: 18 is highly correlated with Unnamed: 17 and 2 other fields High correlation
Value is highly correlated with Rank and 1 other field High correlation
postcode is highly correlated with state and 1 other field High correlation
owns_car is highly correlated with Unnamed: 17 High correlation
state is highly correlated with country and 1 other field High correlation
country is highly correlated with state and 5 other fields High correlation
deceased_indicator is highly correlated with state and 5 other fields High correlation
wealth_segment is highly correlated with country and 1 other field High correlation
job_industry_category is highly correlated with country and 1 other field High correlation
gender is highly correlated with country and 1 other field High correlation
owns_car is highly correlated with country and 1 other field High correlation
last_name has 29 (2.9%) missing values Missing
DOB has 17 (1.7%) missing values Missing
job_title has 106 (10.6%) missing values Missing
job_industry_category has 165 (16.5%) missing values Missing
first_name is uniformly distributed Uniform
last_name is uniformly distributed Uniform
address is uniformly distributed Uniform
address has unique values Unique
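Several of these warnings reduce to counting rules over one column. The sketch below gives plausible versions of the Constant, High cardinality, Unique and Missing checks.

The 50-distinct-value threshold and the exact conditions are assumptions for illustration, not pandas-profiling's implementation. A threshold of 50 is consistent with this report: job_title (184 distinct) is flagged, while job_industry_category (9 distinct) is not. The Uniform check is a statistical test and is left out.

```python
from collections import Counter

# Assumed threshold for "High cardinality"; pandas-profiling makes this configurable.
CARDINALITY_THRESHOLD = 50


def column_warnings(values, threshold=CARDINALITY_THRESHOLD):
    """Return warning labels for one column; None marks a missing value (hypothetical helper)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    warnings = []
    if len(counts) == 1:
        warnings.append("Constant")
    if len(counts) > threshold:
        warnings.append("High cardinality")
    if present and all(c == 1 for c in counts.values()):
        warnings.append("Unique")
    if len(present) < len(values):
        warnings.append("Missing")
    return warnings


print(column_warnings(["Australia"] * 1000))                    # ['Constant']
print(column_warnings([f"{i} Main Lane" for i in range(1000)]))  # ['High cardinality', 'Unique']
```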

Reproduction

Analysis started: 2021-07-28 19:48:48.320285
Analysis finished: 2021-07-28 19:49:14.444215
Duration: 26.12 seconds
Software version: pandas-profiling v3.0.0

Variables

first_name
Categorical

HIGH CARDINALITY
UNIFORM

Distinct: 940
Distinct (%): 94.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Dorian: 3
Rozamond: 3
Mandie: 3
Tyne: 2
Harman: 2
Other values (935): 987

Length

Max length: 13
Median length: 6
Mean length: 6.087
Min length: 2

Characters and Unicode

Total characters: 6087
Distinct characters: 52
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 883
Unique (%): 88.3%
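"Distinct" counts the different values in a column. "Unique" counts only the values that occur exactly once. Here, 940 different first names appear, but only 883 of them appear a single time.

The percentages in this report appear to be taken over non-missing values. For last_name, 961 / 971 ≈ 99.0% and 951 / 971 ≈ 97.9%, where 971 = 1000 - 29 missing.

A small sketch of both measures follows. The function name is hypothetical, and `None` stands in for a missing value.

```python
from collections import Counter


def distinct_and_unique(values):
    """Distinct/unique counts and percentages over non-missing values (None = missing)."""
    present = [v for v in values if v is not None]
    counts = Counter(present)
    n = len(present)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)  # values seen exactly once
    return {
        "distinct": distinct,
        "distinct_pct": 100 * distinct / n if n else 0.0,
        "unique": unique,
        "unique_pct": 100 * unique / n if n else 0.0,
    }


print(distinct_and_unique(["Dorian", "Dorian", "Tyne", "Nicol", None]))
# {'distinct': 3, 'distinct_pct': 75.0, 'unique': 2, 'unique_pct': 50.0}
```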

Sample

1st row: Chickie
2nd row: Morly
3rd row: Ardelis
4th row: Lucine
5th row: Melinda

Common Values

Value | Count | Frequency (%)
Dorian | 3 | 0.3%
Rozamond | 3 | 0.3%
Mandie | 3 | 0.3%
Tyne | 2 | 0.2%
Harman | 2 | 0.2%
Bartram | 2 | 0.2%
Muffin | 2 | 0.2%
Jesse | 2 | 0.2%
Liane | 2 | 0.2%
Nicol | 2 | 0.2%
Other values (930) | 977 | 97.7%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
mandie | 3 | 0.3%
rozamond | 3 | 0.3%
dorian | 3 | 0.3%
wheeler | 2 | 0.2%
geoff | 2 | 0.2%
bartram | 2 | 0.2%
art | 2 | 0.2%
beverlee | 2 | 0.2%
suzy | 2 | 0.2%
laurie | 2 | 0.2%
Other values (930) | 977 | 97.7%

Most occurring characters

Value | Count | Frequency (%)
e | 702 | 11.5%
a | 631 | 10.4%
i | 566 | 9.3%
n | 488 | 8.0%
r | 434 | 7.1%
l | 414 | 6.8%
o | 303 | 5.0%
t | 223 | 3.7%
d | 193 | 3.2%
y | 168 | 2.8%
Other values (42) | 1965 | 32.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5086 | 83.6%
Uppercase Letter | 1000 | 16.4%
Dash Punctuation | 1 | < 0.1%
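These category names are Unicode general categories. Python's standard `unicodedata.category` returns the two-letter code for each character (`Ll`, `Lu`, `Pd`, ...), so the category counts can be sketched as below. The name mapping covers only the categories that appear in this report, and the sample strings are made up.

The standard library does not expose the Unicode Script property, so script counts are not reproduced. The ASCII block check is just a code-point range test.

```python
import unicodedata
from collections import Counter

# Two-letter general-category codes mapped to the names used in this report (subset only).
CATEGORY_NAMES = {
    "Ll": "Lowercase Letter",
    "Lu": "Uppercase Letter",
    "Nd": "Decimal Number",
    "Zs": "Space Separator",
    "Pd": "Dash Punctuation",
    "Po": "Other Punctuation",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for text in values:
        for ch in text:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


def in_ascii_block(text):
    # The Basic Latin (ASCII) block covers code points U+0000 to U+007F.
    return all(ord(ch) < 0x80 for ch in text)


print(category_counts(["Van den Velde", "O'Neil", "Smith-Jones"]))
```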

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 702 | 13.8%
a | 631 | 12.4%
i | 566 | 11.1%
n | 488 | 9.6%
r | 434 | 8.5%
l | 414 | 8.1%
o | 303 | 6.0%
t | 223 | 4.4%
d | 193 | 3.8%
y | 168 | 3.3%
Other values (16) | 964 | 19.0%
Uppercase Letter
Value | Count | Frequency (%)
A | 88 | 8.8%
C | 80 | 8.0%
M | 75 | 7.5%
D | 71 | 7.1%
S | 64 | 6.4%
R | 63 | 6.3%
L | 61 | 6.1%
B | 56 | 5.6%
K | 54 | 5.4%
G | 54 | 5.4%
Other values (15) | 334 | 33.4%
Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 6086 | > 99.9%
Common | 1 | < 0.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 702 | 11.5%
a | 631 | 10.4%
i | 566 | 9.3%
n | 488 | 8.0%
r | 434 | 7.1%
l | 414 | 6.8%
o | 303 | 5.0%
t | 223 | 3.7%
d | 193 | 3.2%
y | 168 | 2.8%
Other values (41) | 1964 | 32.3%
Common
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6087 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 702 | 11.5%
a | 631 | 10.4%
i | 566 | 9.3%
n | 488 | 8.0%
r | 434 | 7.1%
l | 414 | 6.8%
o | 303 | 5.0%
t | 223 | 3.7%
d | 193 | 3.2%
y | 168 | 2.8%
Other values (42) | 1965 | 32.3%

last_name
Categorical

HIGH CARDINALITY
MISSING
UNIFORM

Distinct: 961
Distinct (%): 99.0%
Missing: 29
Missing (%): 2.9%
Memory size: 7.9 KiB

Eade: 2
Burgoine: 2
Borsi: 2
Crellim: 2
Van den Velde: 2
Other values (956): 961

Length

Max length: 21
Median length: 7
Mean length: 7.026776519
Min length: 3

Characters and Unicode

Total characters: 6823
Distinct characters: 54
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 951
Unique (%): 97.9%

Sample

1st row: Brister
2nd row: Genery
3rd row: Forrester
4th row: Stutt
5th row: Hadlee

Common Values

Value | Count | Frequency (%)
Eade | 2 | 0.2%
Burgoine | 2 | 0.2%
Borsi | 2 | 0.2%
Crellim | 2 | 0.2%
Van den Velde | 2 | 0.2%
Sissel | 2 | 0.2%
Sturch | 2 | 0.2%
Minshall | 2 | 0.2%
Shoesmith | 2 | 0.2%
Hallt | 2 | 0.2%
Other values (951) | 951 | 95.1%
(Missing) | 29 | 2.9%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
van | 3 | 0.3%
de | 3 | 0.3%
den | 3 | 0.3%
velde | 2 | 0.2%
shoesmith | 2 | 0.2%
crellim | 2 | 0.2%
borsi | 2 | 0.2%
burgoine | 2 | 0.2%
eade | 2 | 0.2%
hallt | 2 | 0.2%
Other values (960) | 963 | 97.7%

Most occurring characters

Value | Count | Frequency (%)
e | 707 | 10.4%
a | 529 | 7.8%
n | 500 | 7.3%
r | 454 | 6.7%
o | 441 | 6.5%
l | 410 | 6.0%
i | 406 | 6.0%
t | 361 | 5.3%
s | 316 | 4.6%
d | 207 | 3.0%
Other values (44) | 2492 | 36.5%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 5783 | 84.8%
Uppercase Letter | 1014 | 14.9%
Space Separator | 15 | 0.2%
Other Punctuation | 10 | 0.1%
Dash Punctuation | 1 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 707 | 12.2%
a | 529 | 9.1%
n | 500 | 8.6%
r | 454 | 7.9%
o | 441 | 7.6%
l | 410 | 7.1%
i | 406 | 7.0%
t | 361 | 6.2%
s | 316 | 5.5%
d | 207 | 3.6%
Other values (16) | 1452 | 25.1%
Uppercase Letter
Value | Count | Frequency (%)
B | 120 | 11.8%
S | 94 | 9.3%
C | 93 | 9.2%
M | 79 | 7.8%
D | 67 | 6.6%
P | 58 | 5.7%
H | 56 | 5.5%
A | 51 | 5.0%
G | 48 | 4.7%
R | 43 | 4.2%
Other values (15) | 305 | 30.1%
Space Separator
Value | Count | Frequency (%)
(space) | 15 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
' | 10 | 100.0%
Dash Punctuation
Value | Count | Frequency (%)
- | 1 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 6797 | 99.6%
Common | 26 | 0.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 707 | 10.4%
a | 529 | 7.8%
n | 500 | 7.4%
r | 454 | 6.7%
o | 441 | 6.5%
l | 410 | 6.0%
i | 406 | 6.0%
t | 361 | 5.3%
s | 316 | 4.6%
d | 207 | 3.0%
Other values (41) | 2466 | 36.3%
Common
Value | Count | Frequency (%)
(space) | 15 | 57.7%
' | 10 | 38.5%
- | 1 | 3.8%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 6823 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 707 | 10.4%
a | 529 | 7.8%
n | 500 | 7.3%
r | 454 | 6.7%
o | 441 | 6.5%
l | 410 | 6.0%
i | 406 | 6.0%
t | 361 | 5.3%
s | 316 | 4.6%
d | 207 | 3.0%
Other values (44) | 2492 | 36.5%

gender
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Female: 513
Male: 470
U: 17

Length

Max length: 6
Median length: 6
Mean length: 4.975
Min length: 1

Characters and Unicode

Total characters: 4975
Distinct characters: 7
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male
2nd row: Male
3rd row: Female
4th row: Female
5th row: Female

Common Values

Value | Count | Frequency (%)
Female | 513 | 51.3%
Male | 470 | 47.0%
U | 17 | 1.7%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
female | 513 | 51.3%
male | 470 | 47.0%
u | 17 | 1.7%

Most occurring characters

Value | Count | Frequency (%)
e | 1496 | 30.1%
a | 983 | 19.8%
l | 983 | 19.8%
F | 513 | 10.3%
m | 513 | 10.3%
M | 470 | 9.4%
U | 17 | 0.3%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3975 | 79.9%
Uppercase Letter | 1000 | 20.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 1496 | 37.6%
a | 983 | 24.7%
l | 983 | 24.7%
m | 513 | 12.9%
Uppercase Letter
Value | Count | Frequency (%)
F | 513 | 51.3%
M | 470 | 47.0%
U | 17 | 1.7%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 4975 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 1496 | 30.1%
a | 983 | 19.8%
l | 983 | 19.8%
F | 513 | 10.3%
m | 513 | 10.3%
M | 470 | 9.4%
U | 17 | 0.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 4975 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 1496 | 30.1%
a | 983 | 19.8%
l | 983 | 19.8%
F | 513 | 10.3%
m | 513 | 10.3%
M | 470 | 9.4%
U | 17 | 0.3%

Distinct: 100
Distinct (%): 10.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 49.836
Minimum: 0
Maximum: 99
Zeros: 9
Zeros (%): 0.9%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 5
Q1: 26.75
median: 51
Q3: 72
95-th percentile: 94
Maximum: 99
Range: 99
Interquartile range (IQR): 45.25

Descriptive statistics

Standard deviation: 27.79668613
Coefficient of variation (CV): 0.5577631858
Kurtosis: -1.088048884
Mean: 49.836
Median Absolute Deviation (MAD): 22.5
Skewness: -0.06562186172
Sum: 49836
Variance: 772.6557598
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
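The bin counts in this report are 50 here, 23 for tenure and 12 for property_valuation. These are consistent with taking the smaller of 50 and the number of distinct values. That rule is inferred from this report, not documented behaviour.

A sketch of an equal-width histogram under that assumption follows. As in `numpy.histogram`, every bin is half-open except the last, which is closed on the right so the maximum is counted. The function name is hypothetical.

```python
def fixed_size_histogram(values, max_bins=50):
    """Equal-width histogram with min(max_bins, distinct values) bins (assumed rule)."""
    bins = min(max_bins, len(set(values)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width) if hi > lo else 0
        # The maximum would fall one past the last bin; keep it in the last (closed) bin.
        counts[min(idx, bins - 1)] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


edges, counts = fixed_size_histogram([0, 1, 1, 2, 22])
print(len(edges), counts)  # 5 [4, 0, 0, 1]
```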
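
Common Values

Value | Count | Frequency (%)
60 | 20 | 2.0%
59 | 18 | 1.8%
70 | 17 | 1.7%
42 | 17 | 1.7%
37 | 16 | 1.6%
11 | 16 | 1.6%
47 | 15 | 1.5%
84 | 14 | 1.4%
62 | 14 | 1.4%
57 | 14 | 1.4%
Other values (90) | 839 | 83.9%

Minimum 10 values

Value | Count | Frequency (%)
0 | 9 | 0.9%
1 | 8 | 0.8%
2 | 9 | 0.9%
3 | 9 | 0.9%
4 | 10 | 1.0%
5 | 13 | 1.3%
6 | 10 | 1.0%
7 | 13 | 1.3%
8 | 7 | 0.7%
9 | 5 | 0.5%

Maximum 10 values

Value | Count | Frequency (%)
99 | 9 | 0.9%
98 | 6 | 0.6%
97 | 11 | 1.1%
96 | 9 | 0.9%
95 | 8 | 0.8%
94 | 12 | 1.2%
93 | 9 | 0.9%
92 | 5 | 0.5%
91 | 8 | 0.8%
90 | 6 | 0.6%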

DOB
Date

MISSING

Distinct: 958
Distinct (%): 97.5%
Missing: 17
Missing (%): 1.7%
Memory size: 7.9 KiB
Minimum: 1938-06-08 00:00:00
Maximum: 2002-02-27 00:00:00
Histogram with fixed size bins (bins=50)
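The minimum and maximum of a date column are taken over non-missing values only; 17 of the 1000 DOB entries are missing here.

As an illustration, the sketch below parses ISO `YYYY-MM-DD` strings, skips missing entries, and turns the range into ages on 2021-07-28, the analysis date listed under Reproduction. The input format, the helper names and the age calculation are assumptions added for illustration; they are not part of the report. The third date in the usage line is made up.

```python
from datetime import date


def date_range(values):
    """Earliest and latest date among ISO 'YYYY-MM-DD' strings, skipping None."""
    parsed = [date.fromisoformat(v) for v in values if v is not None]
    return min(parsed), max(parsed)


def age_on(born, on):
    # Completed years: subtract one if the birthday has not yet come round in on.year.
    return on.year - born.year - ((on.month, on.day) < (born.month, born.day))


oldest, youngest = date_range(["1938-06-08", None, "2002-02-27", "1975-11-30"])
analysis_day = date(2021, 7, 28)
print(age_on(oldest, analysis_day), age_on(youngest, analysis_day))  # 83 19
```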

job_title
Categorical

HIGH CARDINALITY
MISSING

Distinct: 184
Distinct (%): 20.6%
Missing: 106
Missing (%): 10.6%
Memory size: 7.9 KiB

Associate Professor: 15
Environmental Tech: 14
Software Consultant: 14
Chief Design Engineer: 13
Assistant Manager: 12
Other values (179): 826

Length

Max length: 36
Median length: 18
Mean length: 18.08836689
Min length: 5

Characters and Unicode

Total characters: 16171
Distinct characters: 47
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 45
Unique (%): 5.0%

Sample

1st row: General Manager
2nd row: Structural Engineer
3rd row: Senior Cost Accountant
4th row: Account Representative III
5th row: Financial Analyst

Common Values

Value | Count | Frequency (%)
Associate Professor | 15 | 1.5%
Environmental Tech | 14 | 1.4%
Software Consultant | 14 | 1.4%
Chief Design Engineer | 13 | 1.3%
Assistant Manager | 12 | 1.2%
VP Sales | 12 | 1.2%
Assistant Media Planner | 12 | 1.2%
Senior Sales Associate | 12 | 1.2%
Cost Accountant | 12 | 1.2%
Payment Adjustment Coordinator | 11 | 1.1%
Other values (174) | 767 | 76.7%
(Missing) | 106 | 10.6%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
engineer | 131 | 6.3%
assistant | 82 | 3.9%
manager | 76 | 3.7%
analyst | 66 | 3.2%
iv | 52 | 2.5%
iii | 50 | 2.4%
vp | 46 | 2.2%
ii | 44 | 2.1%
senior | 44 | 2.1%
sales | 44 | 2.1%
Other values (117) | 1444 | 69.5%

Most occurring characters

Value | Count | Frequency (%)
e | 1578 | 9.8%
n | 1279 | 7.9%
a | 1253 | 7.7%
t | 1218 | 7.5%
(space) | 1185 | 7.3%
i | 1124 | 7.0%
r | 1083 | 6.7%
s | 966 | 6.0%
o | 773 | 4.8%
c | 678 | 4.2%
Other values (37) | 5034 | 31.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 12650 | 78.2%
Uppercase Letter | 2328 | 14.4%
Space Separator | 1185 | 7.3%
Other Punctuation | 8 | < 0.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 1578 | 12.5%
n | 1279 | 10.1%
a | 1253 | 9.9%
t | 1218 | 9.6%
i | 1124 | 8.9%
r | 1083 | 8.6%
s | 966 | 7.6%
o | 773 | 6.1%
c | 678 | 5.4%
l | 532 | 4.2%
Other values (14) | 2166 | 17.1%
Uppercase Letter
Value | Count | Frequency (%)
I | 351 | 15.1%
A | 346 | 14.9%
S | 271 | 11.6%
E | 201 | 8.6%
P | 197 | 8.5%
C | 156 | 6.7%
M | 134 | 5.8%
D | 121 | 5.2%
V | 98 | 4.2%
T | 78 | 3.4%
Other values (11) | 375 | 16.1%
Space Separator
Value | Count | Frequency (%)
(space) | 1185 | 100.0%
Other Punctuation
Value | Count | Frequency (%)
/ | 8 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 14978 | 92.6%
Common | 1193 | 7.4%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 1578 | 10.5%
n | 1279 | 8.5%
a | 1253 | 8.4%
t | 1218 | 8.1%
i | 1124 | 7.5%
r | 1083 | 7.2%
s | 966 | 6.4%
o | 773 | 5.2%
c | 678 | 4.5%
l | 532 | 3.6%
Other values (35) | 4494 | 30.0%
Common
Value | Count | Frequency (%)
(space) | 1185 | 99.3%
/ | 8 | 0.7%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 16171 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
e | 1578 | 9.8%
n | 1279 | 7.9%
a | 1253 | 7.7%
t | 1218 | 7.5%
(space) | 1185 | 7.3%
i | 1124 | 7.0%
r | 1083 | 6.7%
s | 966 | 6.0%
o | 773 | 4.8%
c | 678 | 4.2%
Other values (37) | 5034 | 31.1%

job_industry_category
Categorical

HIGH CORRELATION
MISSING

Distinct: 9
Distinct (%): 1.1%
Missing: 165
Missing (%): 16.5%
Memory size: 7.9 KiB

Financial Services: 203
Manufacturing: 199
Health: 152
Retail: 78
Property: 64
Other values (4): 139

Length

Max length: 18
Median length: 13
Mean length: 11.31976048
Min length: 2

Characters and Unicode

Total characters: 9452
Distinct characters: 29
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Manufacturing
2nd row: Property
3rd row: Financial Services
4th row: Manufacturing
5th row: Financial Services

Common Values

Value | Count | Frequency (%)
Financial Services | 203 | 20.3%
Manufacturing | 199 | 19.9%
Health | 152 | 15.2%
Retail | 78 | 7.8%
Property | 64 | 6.4%
IT | 51 | 5.1%
Entertainment | 37 | 3.7%
Argiculture | 26 | 2.6%
Telecommunications | 25 | 2.5%
(Missing) | 165 | 16.5%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
financial | 203 | 19.6%
services | 203 | 19.6%
manufacturing | 199 | 19.2%
health | 152 | 14.6%
retail | 78 | 7.5%
property | 64 | 6.2%
it | 51 | 4.9%
entertainment | 37 | 3.6%
argiculture | 26 | 2.5%
telecommunications | 25 | 2.4%
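Word frequencies are computed over all words rather than all rows. "Financial Services" contributes two words, so the 835 non-missing values yield 1038 words, and 203 / 1038 ≈ 19.6% for "financial".

The sketch below assumes lower-casing and a simple `\w+` tokenizer; pandas-profiling's own tokenizer may split some values differently. It rebuilds the column from the Common Values counts above and keeps the dataset's own spelling "Argiculture".

```python
import re
from collections import Counter


def word_frequencies(values):
    """Map each lower-cased word to (count, percentage of all words); None = missing."""
    words = Counter()
    for text in values:
        if text is not None:
            words.update(re.findall(r"\w+", text.lower()))
    total = sum(words.values())
    return {word: (count, 100 * count / total) for word, count in words.most_common()}


# Rebuild job_industry_category from the Common Values counts above.
industry_counts = {
    "Financial Services": 203, "Manufacturing": 199, "Health": 152, "Retail": 78,
    "Property": 64, "IT": 51, "Entertainment": 37, "Argiculture": 26,
    "Telecommunications": 25, None: 165,
}
values = [v for v, n in industry_counts.items() for _ in range(n)]
freqs = word_frequencies(values)
print(freqs["financial"][0], round(freqs["financial"][1], 1))  # 203 19.6
```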

Most occurring characters

Value | Count | Frequency (%)
a | 1096 | 11.6%
i | 999 | 10.6%
n | 965 | 10.2%
e | 850 | 9.0%
c | 681 | 7.2%
t | 655 | 6.9%
r | 619 | 6.5%
l | 484 | 5.1%
u | 475 | 5.0%
s | 228 | 2.4%
Other values (19) | 2400 | 25.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 8160 | 86.3%
Uppercase Letter | 1089 | 11.5%
Space Separator | 203 | 2.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 1096 | 13.4%
i | 999 | 12.2%
n | 965 | 11.8%
e | 850 | 10.4%
c | 681 | 8.3%
t | 655 | 8.0%
r | 619 | 7.6%
l | 484 | 5.9%
u | 475 | 5.8%
s | 228 | 2.8%
Other values (8) | 1108 | 13.6%
Uppercase Letter
Value | Count | Frequency (%)
F | 203 | 18.6%
S | 203 | 18.6%
M | 199 | 18.3%
H | 152 | 14.0%
R | 78 | 7.2%
T | 76 | 7.0%
P | 64 | 5.9%
I | 51 | 4.7%
E | 37 | 3.4%
A | 26 | 2.4%
Space Separator
Value | Count | Frequency (%)
(space) | 203 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 9249 | 97.9%
Common | 203 | 2.1%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 1096 | 11.8%
i | 999 | 10.8%
n | 965 | 10.4%
e | 850 | 9.2%
c | 681 | 7.4%
t | 655 | 7.1%
r | 619 | 6.7%
l | 484 | 5.2%
u | 475 | 5.1%
s | 228 | 2.5%
Other values (18) | 2197 | 23.8%
Common
Value | Count | Frequency (%)
(space) | 203 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9452 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 1096 | 11.6%
i | 999 | 10.6%
n | 965 | 10.2%
e | 850 | 9.0%
c | 681 | 7.2%
t | 655 | 6.9%
r | 619 | 6.5%
l | 484 | 5.1%
u | 475 | 5.0%
s | 228 | 2.4%
Other values (19) | 2400 | 25.4%

wealth_segment
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Mass Customer: 508
High Net Worth: 251
Affluent Customer: 241

Length

Max length: 17
Median length: 13
Mean length: 14.215
Min length: 13

Characters and Unicode

Total characters: 14215
Distinct characters: 21
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Mass Customer
2nd row: Mass Customer
3rd row: Affluent Customer
4th row: Affluent Customer
5th row: Affluent Customer

Common Values

Value | Count | Frequency (%)
Mass Customer | 508 | 50.8%
High Net Worth | 251 | 25.1%
Affluent Customer | 241 | 24.1%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
customer | 749 | 33.3%
mass | 508 | 22.6%
net | 251 | 11.2%
worth | 251 | 11.2%
high | 251 | 11.2%
affluent | 241 | 10.7%

Most occurring characters

Value | Count | Frequency (%)
s | 1765 | 12.4%
t | 1492 | 10.5%
(space) | 1251 | 8.8%
e | 1241 | 8.7%
o | 1000 | 7.0%
r | 1000 | 7.0%
u | 990 | 7.0%
C | 749 | 5.3%
m | 749 | 5.3%
M | 508 | 3.6%
Other values (11) | 3470 | 24.4%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 10713 | 75.4%
Uppercase Letter | 2251 | 15.8%
Space Separator | 1251 | 8.8%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
s | 1765 | 16.5%
t | 1492 | 13.9%
e | 1241 | 11.6%
o | 1000 | 9.3%
r | 1000 | 9.3%
u | 990 | 9.2%
m | 749 | 7.0%
a | 508 | 4.7%
h | 502 | 4.7%
f | 482 | 4.5%
Other values (4) | 984 | 9.2%
Uppercase Letter
Value | Count | Frequency (%)
C | 749 | 33.3%
M | 508 | 22.6%
H | 251 | 11.2%
N | 251 | 11.2%
W | 251 | 11.2%
A | 241 | 10.7%
Space Separator
Value | Count | Frequency (%)
(space) | 1251 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12964 | 91.2%
Common | 1251 | 8.8%

Most frequent character per script

Latin
Value | Count | Frequency (%)
s | 1765 | 13.6%
t | 1492 | 11.5%
e | 1241 | 9.6%
o | 1000 | 7.7%
r | 1000 | 7.7%
u | 990 | 7.6%
C | 749 | 5.8%
m | 749 | 5.8%
M | 508 | 3.9%
a | 508 | 3.9%
Other values (10) | 2962 | 22.8%
Common
Value | Count | Frequency (%)
(space) | 1251 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 14215 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
s | 1765 | 12.4%
t | 1492 | 10.5%
(space) | 1251 | 8.8%
e | 1241 | 8.7%
o | 1000 | 7.0%
r | 1000 | 7.0%
u | 990 | 7.0%
C | 749 | 5.3%
m | 749 | 5.3%
M | 508 | 3.6%
Other values (11) | 3470 | 24.4%

deceased_indicator
Boolean

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

False: 1000

Value | Count | Frequency (%)
False | 1000 | 100.0%

owns_car
Boolean

HIGH CORRELATION

Distinct: 2
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 1.1 KiB

False: 507
True: 493

Value | Count | Frequency (%)
False | 507 | 50.7%
True | 493 | 49.3%

tenure
Real number (ℝ≥0)

Distinct: 23
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 11.388
Minimum: 0
Maximum: 22
Zeros: 2
Zeros (%): 0.2%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0
5-th percentile: 3
Q1: 7
median: 11
Q3: 15
95-th percentile: 20
Maximum: 22
Range: 22
Interquartile range (IQR): 8

Descriptive statistics

Standard deviation: 5.037144908
Coefficient of variation (CV): 0.442320417
Kurtosis: -0.8128152156
Mean: 11.388
Median Absolute Deviation (MAD): 4
Skewness: 0.07089079797
Sum: 11388
Variance: 25.37282883
Monotonicity: Not monotonic
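The statistics above can be reproduced from tenure's value counts. The Common Values and extreme-value tables for this variable together list all 23 distinct values, and their counts sum to 1000 rows.

The sketch assumes the sample (n - 1) variance and linear-interpolation quantiles, which is what `statistics.quantiles(..., method="inclusive")` computes. Under these assumptions the results agree with the figures above. Skewness and kurtosis are omitted, and the helper name `describe` is hypothetical.

```python
import statistics


def describe(values):
    """Report-style summary: sample variance, linear-interpolation quantiles, MAD about the median."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    ventiles = statistics.quantiles(data, n=20, method="inclusive")  # 5%, 10%, ..., 95%
    mean = statistics.mean(data)
    std = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return {
        "mean": mean,
        "std": std,
        "variance": statistics.variance(data),
        "cv": std / mean,
        "p5": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],
        "iqr": q3 - q1,
        "range": data[-1] - data[0],
        "mad": statistics.median(abs(x - median) for x in data),
        "sum": sum(data),
    }


# tenure value -> count (index = tenure value), from the Common Values and extreme-value tables.
TENURE_COUNTS = [2, 8, 15, 26, 36, 60, 45, 60, 55, 79, 63, 68,
                 61, 74, 54, 58, 49, 59, 36, 34, 22, 24, 12]
tenure = [value for value, count in enumerate(TENURE_COUNTS) for _ in range(count)]
summary = describe(tenure)
print(summary["mean"], summary["q1"], summary["median"], summary["q3"], summary["mad"])
# 11.388 7.0 11.0 15.0 4.0
```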
Histogram with fixed size bins (bins=23)
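
Common Values

Value | Count | Frequency (%)
9 | 79 | 7.9%
13 | 74 | 7.4%
11 | 68 | 6.8%
10 | 63 | 6.3%
12 | 61 | 6.1%
7 | 60 | 6.0%
5 | 60 | 6.0%
17 | 59 | 5.9%
15 | 58 | 5.8%
8 | 55 | 5.5%
Other values (13) | 363 | 36.3%

Minimum 10 values

Value | Count | Frequency (%)
0 | 2 | 0.2%
1 | 8 | 0.8%
2 | 15 | 1.5%
3 | 26 | 2.6%
4 | 36 | 3.6%
5 | 60 | 6.0%
6 | 45 | 4.5%
7 | 60 | 6.0%
8 | 55 | 5.5%
9 | 79 | 7.9%

Maximum 10 values

Value | Count | Frequency (%)
22 | 12 | 1.2%
21 | 24 | 2.4%
20 | 22 | 2.2%
19 | 34 | 3.4%
18 | 36 | 3.6%
17 | 59 | 5.9%
16 | 49 | 4.9%
15 | 58 | 5.8%
14 | 54 | 5.4%
13 | 74 | 7.4%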

address
Categorical

HIGH CARDINALITY
UNIFORM
UNIQUE

Distinct: 1000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

309 Maple Wood Pass: 1
5880 Hauk Street: 1
602 Meadow Vale Lane: 1
9 Killdeer Circle: 1
7 Brentwood Circle: 1
Other values (995): 995

Length

Max length: 26
Median length: 18
Mean length: 17.582
Min length: 9

Characters and Unicode

Total characters: 17582
Distinct characters: 60
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1

Unique

Unique: 1000
Unique (%): 100.0%

Sample

1st row: 45 Shopko Center
2nd row: 14 Mccormick Park
3rd row: 5 Colorado Crossing
4th row: 207 Annamark Plaza
5th row: 115 Montana Place

Common Values

Value | Count | Frequency (%)
309 Maple Wood Pass | 1 | 0.1%
5880 Hauk Street | 1 | 0.1%
602 Meadow Vale Lane | 1 | 0.1%
9 Killdeer Circle | 1 | 0.1%
7 Brentwood Circle | 1 | 0.1%
6115 Forest Crossing | 1 | 0.1%
05475 Elgar Place | 1 | 0.1%
33 Pond Point | 1 | 0.1%
2 Main Lane | 1 | 0.1%
7 Messerschmidt Crossing | 1 | 0.1%
Other values (990) | 990 | 99.0%

Length

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
park | 59 | 1.9%
crossing | 59 | 1.9%
center | 58 | 1.9%
avenue | 55 | 1.8%
street | 55 | 1.8%
lane | 55 | 1.8%
point | 54 | 1.7%
hill | 51 | 1.6%
plaza | 50 | 1.6%
court | 49 | 1.6%
Other values (1137) | 2563 | 82.5%

Most occurring characters

Value | Count | Frequency (%)
(space) | 2108 | 12.0%
e | 1394 | 7.9%
a | 1152 | 6.6%
r | 1072 | 6.1%
n | 884 | 5.0%
o | 787 | 4.5%
i | 748 | 4.3%
l | 732 | 4.2%
t | 645 | 3.7%
s | 469 | 2.7%
Other values (50) | 7591 | 43.2%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 10401 | 59.2%
Decimal Number | 2975 | 16.9%
Space Separator | 2108 | 12.0%
Uppercase Letter | 2098 | 11.9%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
e | 1394 | 13.4%
a | 1152 | 11.1%
r | 1072 | 10.3%
n | 884 | 8.5%
o | 787 | 7.6%
i | 748 | 7.2%
l | 732 | 7.0%
t | 645 | 6.2%
s | 469 | 4.5%
d | 318 | 3.1%
Other values (16) | 2200 | 21.2%
Uppercase Letter
Value | Count | Frequency (%)
P | 337 | 16.1%
C | 272 | 13.0%
S | 167 | 8.0%
M | 142 | 6.8%
A | 142 | 6.8%
T | 114 | 5.4%
R | 113 | 5.4%
D | 112 | 5.3%
L | 108 | 5.1%
H | 99 | 4.7%
Other values (13) | 492 | 23.5%
Decimal Number
Value | Count | Frequency (%)
6 | 330 | 11.1%
7 | 317 | 10.7%
0 | 315 | 10.6%
2 | 305 | 10.3%
3 | 300 | 10.1%
1 | 294 | 9.9%
5 | 293 | 9.8%
9 | 279 | 9.4%
8 | 273 | 9.2%
4 | 269 | 9.0%
Space Separator
Value | Count | Frequency (%)
(space) | 2108 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 12499 | 71.1%
Common | 5083 | 28.9%

Most frequent character per script

Latin
Value | Count | Frequency (%)
e | 1394 | 11.2%
a | 1152 | 9.2%
r | 1072 | 8.6%
n | 884 | 7.1%
o | 787 | 6.3%
i | 748 | 6.0%
l | 732 | 5.9%
t | 645 | 5.2%
s | 469 | 3.8%
P | 337 | 2.7%
Other values (39) | 4279 | 34.2%
Common
Value | Count | Frequency (%)
(space) | 2108 | 41.5%
6 | 330 | 6.5%
7 | 317 | 6.2%
0 | 315 | 6.2%
2 | 305 | 6.0%
3 | 300 | 5.9%
1 | 294 | 5.8%
5 | 293 | 5.8%
9 | 279 | 5.5%
8 | 273 | 5.4%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 17582 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
(space) | 2108 | 12.0%
e | 1394 | 7.9%
a | 1152 | 6.6%
r | 1072 | 6.1%
n | 884 | 5.0%
o | 787 | 4.5%
i | 748 | 4.3%
l | 732 | 4.2%
t | 645 | 3.7%
s | 469 | 2.7%
Other values (50) | 7591 | 43.2%

postcode
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 522
Distinct (%): 52.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3019.227
Minimum: 2000
Maximum: 4879
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 2000
5-th percentile: 2046
Q1: 2209
median: 2800
Q3: 3845.5
95-th percentile: 4508.05
Maximum: 4879
Range: 2879
Interquartile range (IQR): 1636.5

Descriptive statistics

Standard deviation: 848.8957672
Coefficient of variation (CV): 0.2811632803
Kurtosis: -1.142498217
Mean: 3019.227
Median Absolute Deviation (MAD): 635.5
Skewness: 0.4921079268
Sum: 3019227
Variance: 720624.0235
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
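
Common Values

Value | Count | Frequency (%)
2232 | 9 | 0.9%
2145 | 9 | 0.9%
2168 | 7 | 0.7%
2750 | 7 | 0.7%
3029 | 7 | 0.7%
4207 | 7 | 0.7%
2148 | 7 | 0.7%
3977 | 7 | 0.7%
2066 | 6 | 0.6%
4350 | 6 | 0.6%
Other values (512) | 928 | 92.8%

Minimum 10 values

Value | Count | Frequency (%)
2000 | 1 | 0.1%
2007 | 3 | 0.3%
2009 | 2 | 0.2%
2010 | 4 | 0.4%
2011 | 4 | 0.4%
2015 | 1 | 0.1%
2016 | 2 | 0.2%
2017 | 1 | 0.1%
2019 | 3 | 0.3%
2022 | 1 | 0.1%

Maximum 10 values

Value | Count | Frequency (%)
4879 | 1 | 0.1%
4852 | 1 | 0.1%
4818 | 2 | 0.2%
4817 | 2 | 0.2%
4814 | 3 | 0.3%
4744 | 1 | 0.1%
4740 | 2 | 0.2%
4720 | 1 | 0.1%
4717 | 1 | 0.1%
4710 | 1 | 0.1%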

state
Categorical

HIGH CORRELATION

Distinct: 3
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

NSW: 506
VIC: 266
QLD: 228

Length

Max length: 3
Median length: 3
Mean length: 3
Min length: 3

Characters and Unicode

Total characters: 3000
Distinct characters: 9
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: QLD
2nd row: NSW
3rd row: VIC
4th row: QLD
5th row: NSW

Common Values

Value | Count | Frequency (%)
NSW | 506 | 50.6%
VIC | 266 | 26.6%
QLD | 228 | 22.8%
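The state/postcode association flagged for this variable is expected. Australian postcodes are allocated in largely state-based ranges (2xxx for NSW, 3xxx for VIC, 4xxx for QLD), so postcode almost determines state.

Cramér's V is a standard measure of association between two categorical variables. The sketch below uses the textbook, uncorrected formula V = sqrt(chi2 / (n * (min(r, c) - 1))). pandas-profiling's implementation may apply a bias correction, so its values can differ. The pairing of states and postcodes in the usage lines is made up.

```python
import math
from collections import Counter


def cramers_v(xs, ys):
    """Uncorrected Cramér's V for two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    rows, cols = Counter(xs), Counter(ys)
    k = min(len(rows), len(cols))
    if k < 2:
        return float("nan")  # undefined when either variable is constant
    chi2 = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n
            chi2 += (joint[(r, c)] - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (k - 1)))


states = ["NSW", "NSW", "VIC", "VIC", "QLD", "QLD"]
postcodes = ["2232", "2145", "3029", "3977", "4207", "4350"]
print(round(cramers_v(states, postcodes), 6))  # 1.0
```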

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
nsw | 506 | 50.6%
vic | 266 | 26.6%
qld | 228 | 22.8%

Most occurring characters

Value | Count | Frequency (%)
N | 506 | 16.9%
S | 506 | 16.9%
W | 506 | 16.9%
V | 266 | 8.9%
I | 266 | 8.9%
C | 266 | 8.9%
Q | 228 | 7.6%
L | 228 | 7.6%
D | 228 | 7.6%

Most occurring categories

Value | Count | Frequency (%)
Uppercase Letter | 3000 | 100.0%

Most frequent character per category

Uppercase Letter
Value | Count | Frequency (%)
N | 506 | 16.9%
S | 506 | 16.9%
W | 506 | 16.9%
V | 266 | 8.9%
I | 266 | 8.9%
C | 266 | 8.9%
Q | 228 | 7.6%
L | 228 | 7.6%
D | 228 | 7.6%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 3000 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
N | 506 | 16.9%
S | 506 | 16.9%
W | 506 | 16.9%
V | 266 | 8.9%
I | 266 | 8.9%
C | 266 | 8.9%
Q | 228 | 7.6%
L | 228 | 7.6%
D | 228 | 7.6%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 3000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
N | 506 | 16.9%
S | 506 | 16.9%
W | 506 | 16.9%
V | 266 | 8.9%
I | 266 | 8.9%
C | 266 | 8.9%
Q | 228 | 7.6%
L | 228 | 7.6%
D | 228 | 7.6%

country
Categorical

CONSTANT
HIGH CORRELATION
REJECTED

Distinct: 1
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 7.9 KiB

Australia: 1000

Length

Max length: 9
Median length: 9
Mean length: 9
Min length: 9

Characters and Unicode

Total characters: 9000
Distinct characters: 8
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Australia
2nd row: Australia
3rd row: Australia
4th row: Australia
5th row: Australia

Common Values

Value | Count | Frequency (%)
Australia | 1000 | 100.0%

Length

Histogram of lengths of the category

Pie chart

Words

Value | Count | Frequency (%)
australia | 1000 | 100.0%

Most occurring characters

Value | Count | Frequency (%)
a | 2000 | 22.2%
A | 1000 | 11.1%
u | 1000 | 11.1%
s | 1000 | 11.1%
t | 1000 | 11.1%
r | 1000 | 11.1%
l | 1000 | 11.1%
i | 1000 | 11.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 8000 | 88.9%
Uppercase Letter | 1000 | 11.1%

Most frequent character per category

Lowercase Letter
Value | Count | Frequency (%)
a | 2000 | 25.0%
u | 1000 | 12.5%
s | 1000 | 12.5%
t | 1000 | 12.5%
r | 1000 | 12.5%
l | 1000 | 12.5%
i | 1000 | 12.5%
Uppercase Letter
Value | Count | Frequency (%)
A | 1000 | 100.0%

Most occurring scripts

Value | Count | Frequency (%)
Latin | 9000 | 100.0%

Most frequent character per script

Latin
Value | Count | Frequency (%)
a | 2000 | 22.2%
A | 1000 | 11.1%
u | 1000 | 11.1%
s | 1000 | 11.1%
t | 1000 | 11.1%
r | 1000 | 11.1%
l | 1000 | 11.1%
i | 1000 | 11.1%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 9000 | 100.0%

Most frequent character per block

ASCII
Value | Count | Frequency (%)
a | 2000 | 22.2%
A | 1000 | 11.1%
u | 1000 | 11.1%
s | 1000 | 11.1%
t | 1000 | 11.1%
r | 1000 | 11.1%
l | 1000 | 11.1%
i | 1000 | 11.1%

property_valuation
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 12
Distinct (%): 1.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 7.397
Minimum: 1
Maximum: 12
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 1
5-th percentile: 2
Q1: 6
median: 8
Q3: 9
95-th percentile: 11
Maximum: 12
Range: 11
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.758804452
Coefficient of variation (CV): 0.3729626134
Kurtosis: -0.3712799928
Mean: 7.397
Median Absolute Deviation (MAD): 2
Skewness: -0.5576112079
Sum: 7397
Variance: 7.611002002
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=12)
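
Common Values

Value | Count | Frequency (%)
9 | 173 | 17.3%
8 | 162 | 16.2%
7 | 138 | 13.8%
10 | 116 | 11.6%
6 | 70 | 7.0%
11 | 62 | 6.2%
5 | 57 | 5.7%
4 | 53 | 5.3%
3 | 51 | 5.1%
12 | 46 | 4.6%
Other values (2) | 72 | 7.2%

Minimum 10 values

Value | Count | Frequency (%)
1 | 30 | 3.0%
2 | 42 | 4.2%
3 | 51 | 5.1%
4 | 53 | 5.3%
5 | 57 | 5.7%
6 | 70 | 7.0%
7 | 138 | 13.8%
8 | 162 | 16.2%
9 | 173 | 17.3%
10 | 116 | 11.6%

Maximum 10 values

Value | Count | Frequency (%)
12 | 46 | 4.6%
11 | 62 | 6.2%
10 | 116 | 11.6%
9 | 173 | 17.3%
8 | 162 | 16.2%
7 | 138 | 13.8%
6 | 70 | 7.0%
5 | 57 | 5.7%
4 | 53 | 5.3%
3 | 51 | 5.1%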

Unnamed: 16
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 71
Distinct (%): 7.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.74734
Minimum: 0.4
Maximum: 1.1
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0.4
5-th percentile: 0.43
Q1: 0.57
median: 0.75
Q3: 0.92
95-th percentile: 1.07
Maximum: 1.1
Range: 0.7
Interquartile range (IQR): 0.35

Descriptive statistics

Standard deviation: 0.2050823815
Coefficient of variation (CV): 0.2744164389
Kurtosis: -1.194441209
Mean: 0.74734
Median Absolute Deviation (MAD): 0.17
Skewness: 0.04085608504
Sum: 747.34
Variance: 0.04205878318
Monotonicity: Not monotonic
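The Unnamed columns are flagged as highly correlated with one another. One of the correlation measures pandas-profiling computes is Spearman's rank correlation. It is Pearson's correlation applied to ranks, with tied values given the average of their ranks, so it captures any monotonic relationship, not only a linear one.

A minimal sketch follows. The function names and the inputs in the usage line are made up for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant column
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


def spearman(xs, ys):
    return pearson(average_ranks(xs), average_ranks(ys))


print(round(spearman([0.4, 0.5, 0.75, 1.1], [0.4, 0.625, 0.9375, 1.375]), 6))  # 1.0
```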
Histogram with fixed size bins (bins=50)

Common Values

Value | Count | Frequency (%)
0.62 | 26 | 2.6%
0.86 | 24 | 2.4%
0.58 | 20 | 2.0%
0.49 | 20 | 2.0%
1.08 | 20 | 2.0%
0.77 | 20 | 2.0%
0.84 | 19 | 1.9%
1.07 | 19 | 1.9%
0.56 | 19 | 1.9%
0.57 | 18 | 1.8%
Other values (61) | 795 | 79.5%

Minimum 10 values

Value | Count | Frequency (%)
0.4 | 15 | 1.5%
0.41 | 16 | 1.6%
0.42 | 10 | 1.0%
0.43 | 11 | 1.1%
0.44 | 17 | 1.7%
0.45 | 9 | 0.9%
0.46 | 14 | 1.4%
0.47 | 15 | 1.5%
0.48 | 17 | 1.7%
0.49 | 20 | 2.0%

Maximum 10 values

Value | Count | Frequency (%)
1.1 | 13 | 1.3%
1.09 | 14 | 1.4%
1.08 | 20 | 2.0%
1.07 | 19 | 1.9%
1.06 | 11 | 1.1%
1.05 | 17 | 1.7%
1.04 | 14 | 1.4%
1.03 | 11 | 1.1%
1.02 | 12 | 1.2%
1.01 | 13 | 1.3%

Unnamed: 17
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 132
Distinct (%): 13.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.839005
Minimum: 0.4
Maximum: 1.375
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 7.9 KiB

Quantile statistics

Minimum: 0.4
5-th percentile: 0.47
Q1: 0.6375
median: 0.82
Q3: 1.031875
95-th percentile: 1.3
Maximum: 1.375
Range: 0.975
Interquartile range (IQR): 0.394375

Descriptive statistics

Standard deviation: 0.2488584497
Coefficient of variation (CV): 0.2966114025
Kurtosis: -0.8040909217
Mean: 0.839005
Median Absolute Deviation (MAD): 0.2
Skewness: 0.2688519805
Sum: 839.005
Variance: 0.061930528
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
1.05                20     2.0%
0.55                19     1.9%
0.775               15     1.5%
0.95                15     1.5%
0.5                 14     1.4%
1.07                14     1.4%
0.6                 14     1.4%
0.58                13     1.3%
0.7                 13     1.3%
0.86                13     1.3%
Other values (122)  850    85.0%

Minimum 10 values
Value  Count  Frequency (%)
0.4    6      0.6%
0.41   11     1.1%
0.42   4      0.4%
0.43   6      0.6%
0.44   6      0.6%
0.45   4      0.4%
0.46   7      0.7%
0.47   7      0.7%
0.48   11     1.1%
0.49   11     1.1%

Maximum 10 values
Value   Count  Frequency (%)
1.375   7      0.7%
1.3625  9      0.9%
1.35    12     1.2%
1.3375  5      0.5%
1.325   5      0.5%
1.3125  8      0.8%
1.3     7      0.7%
1.2875  6      0.6%
1.275   5      0.5%
1.2625  8      0.8%

Unnamed: 18
Real number (ℝ≥0)

HIGH CORRELATION

Distinct  183
Distinct (%)  18.3%
Missing  0
Missing (%)  0.0%
Infinite  0
Infinite (%)  0.0%
Mean  0.9426725
Minimum  0.4
Maximum  1.71875
Zeros  0
Zeros (%)  0.0%
Negative  0
Negative (%)  0.0%
Memory size  7.9 KiB

Quantile statistics

Minimum  0.4
5th percentile  0.5125
Q1  0.7125
Median  0.9125
Q3  1.14296875
95th percentile  1.453125
Maximum  1.71875
Range  1.31875
Interquartile range (IQR)  0.43046875

Descriptive statistics

Standard deviation  0.2948324662
Coefficient of variation (CV)  0.3127623498
Kurtosis  -0.4868813031
Mean  0.9426725
Median Absolute Deviation (MAD)  0.2125
Skewness  0.4057067456
Sum  942.6725
Variance  0.08692618315
Monotonicity  Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0.75                18     1.8%
1.05                15     1.5%
0.9625              15     1.5%
1.35                14     1.4%
1.3125              14     1.4%
1.1875              14     1.4%
0.8125              13     1.3%
0.55                13     1.3%
1.1375              13     1.3%
0.725               13     1.3%
Other values (173)  858    85.8%

Minimum 10 values
Value  Count  Frequency (%)
0.4    4      0.4%
0.41   6      0.6%
0.42   1      0.1%
0.43   4      0.4%
0.44   3      0.3%
0.45   1      0.1%
0.46   3      0.3%
0.47   3      0.3%
0.48   4      0.4%
0.49   7      0.7%

Maximum 10 values
Value     Count  Frequency (%)
1.71875   4      0.4%
1.703125  3      0.3%
1.6875    4      0.4%
1.671875  2      0.2%
1.65625   1      0.1%
1.640625  3      0.3%
1.625     3      0.3%
1.609375  5      0.5%
1.59375   3      0.3%
1.578125  4      0.4%

Unnamed: 19
Real number (ℝ≥0)

HIGH CORRELATION

Distinct  321
Distinct (%)  32.1%
Missing  0
Missing (%)  0.0%
Infinite  0
Infinite (%)  0.0%
Mean  0.87051425
Minimum  0.34
Maximum  1.71875
Zeros  0
Zeros (%)  0.0%
Negative  0
Negative (%)  0.0%
Memory size  7.9 KiB

Quantile statistics

Minimum  0.34
5th percentile  0.4675
Q1  0.65875
Median  0.842625
Q3  1.0625
95th percentile  1.36796875
Maximum  1.71875
Range  1.37875
Interquartile range (IQR)  0.40375

Descriptive statistics

Standard deviation  0.2808905394
Coefficient of variation (CV)  0.3226719601
Kurtosis  -0.3139375306
Mean  0.87051425
Median Absolute Deviation (MAD)  0.205125
Skewness  0.4723827623
Sum  870.51425
Variance  0.0788994951
Monotonicity  Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
1.0625              14     1.4%
0.75                11     1.1%
0.6375              10     1.0%
1.05                9      0.9%
0.8125              9      0.9%
1.009375            9      0.9%
0.68                9      0.9%
1.375               9      0.9%
1.1375              9      0.9%
1.3125              9      0.9%
Other values (311)  902    90.2%

Minimum 10 values
Value   Count  Frequency (%)
0.34    1      0.1%
0.3485  4      0.4%
0.357   1      0.1%
0.374   1      0.1%
0.3825  1      0.1%
0.391   1      0.1%
0.3995  2      0.2%
0.4     3      0.3%
0.408   2      0.2%
0.41    2      0.2%

Maximum 10 values
Value     Count  Frequency (%)
1.71875   1      0.1%
1.703125  1      0.1%
1.6875    2      0.2%
1.671875  1      0.1%
1.640625  2      0.2%
1.625     2      0.2%
1.609375  1      0.1%
1.59375   2      0.2%
1.578125  2      0.2%
1.5625    1      0.1%

Unnamed: 20
Real number (ℝ≥0)

HIGH CORRELATION

Distinct  324
Distinct (%)  32.4%
Missing  0
Missing (%)  0.0%
Infinite  0
Infinite (%)  0.0%
Mean  498.819
Minimum  1
Maximum  1000
Zeros  0
Zeros (%)  0.0%
Negative  0
Negative (%)  0.0%
Memory size  7.9 KiB

Quantile statistics

Minimum  1
5th percentile  50
Q1  250
Median  500
Q3  750.25
95th percentile  948.15
Maximum  1000
Range  999
Interquartile range (IQR)  500.25

Descriptive statistics

Standard deviation  288.8109971
Coefficient of variation (CV)  0.5789895675
Kurtosis  -1.200749808
Mean  498.819
Median Absolute Deviation (MAD)  250
Skewness  0.001245859611
Sum  498819
Variance  83411.79203
Monotonicity  Increasing
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
760                 13     1.3%
259                 12     1.2%
455                 9      0.9%
904                 9      0.9%
386                 9      0.9%
133                 9      0.9%
700                 8      0.8%
820                 8      0.8%
312                 8      0.8%
536                 8      0.8%
Other values (314)  907    90.7%

Minimum 10 values
Value  Count  Frequency (%)
1      3      0.3%
4      2      0.2%
6      2      0.2%
8      2      0.2%
10     2      0.2%
12     1      0.1%
13     1      0.1%
14     2      0.2%
16     1      0.1%
17     2      0.2%

Maximum 10 values
Value  Count  Frequency (%)
1000   1      0.1%
997    3      0.3%
996    1      0.1%
994    2      0.2%
993    1      0.1%
988    5      0.5%
987    1      0.1%
985    2      0.2%
983    2      0.2%
979    4      0.4%

Rank
Real number (ℝ≥0)

HIGH CORRELATION

Distinct  324
Distinct (%)  32.4%
Missing  0
Missing (%)  0.0%
Infinite  0
Infinite (%)  0.0%
Mean  498.819
Minimum  1
Maximum  1000
Zeros  0
Zeros (%)  0.0%
Negative  0
Negative (%)  0.0%
Memory size  7.9 KiB

Quantile statistics

Minimum  1
5th percentile  50
Q1  250
Median  500
Q3  750.25
95th percentile  948.15
Maximum  1000
Range  999
Interquartile range (IQR)  500.25

Descriptive statistics

Standard deviation  288.8109971
Coefficient of variation (CV)  0.5789895675
Kurtosis  -1.200749808
Mean  498.819
Median Absolute Deviation (MAD)  250
Skewness  0.001245859611
Sum  498819
Variance  83411.79203
Monotonicity  Increasing
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
760                 13     1.3%
259                 12     1.2%
455                 9      0.9%
904                 9      0.9%
386                 9      0.9%
133                 9      0.9%
700                 8      0.8%
820                 8      0.8%
312                 8      0.8%
536                 8      0.8%
Other values (314)  907    90.7%

Minimum 10 values
Value  Count  Frequency (%)
1      3      0.3%
4      2      0.2%
6      2      0.2%
8      2      0.2%
10     2      0.2%
12     1      0.1%
13     1      0.1%
14     2      0.2%
16     1      0.1%
17     2      0.2%

Maximum 10 values
Value  Count  Frequency (%)
1000   1      0.1%
997    3      0.3%
996    1      0.1%
994    2      0.2%
993    1      0.1%
988    5      0.5%
987    1      0.1%
985    2      0.2%
983    2      0.2%
979    4      0.4%

Value
Real number (ℝ≥0)

HIGH CORRELATION

Distinct  324
Distinct (%)  32.4%
Missing  0
Missing (%)  0.0%
Infinite  0
Infinite (%)  0.0%
Mean  0.8817140937
Minimum  0.34
Maximum  1.71875
Zeros  0
Zeros (%)  0.0%
Negative  0
Negative (%)  0.0%
Memory size  7.9 KiB

Quantile statistics

Minimum  0.34
5th percentile  0.45655625
Q1  0.64953125
Median  0.86
Q3  1.075
95th percentile  1.40625
Maximum  1.71875
Range  1.37875
Interquartile range (IQR)  0.42546875

Descriptive statistics

Standard deviation  0.293524508
Coefficient of variation (CV)  0.3329021392
Kurtosis  -0.4524719248
Mean  0.8817140937
Median Absolute Deviation (MAD)  0.213125
Skewness  0.4299025249
Sum  881.7140937
Variance  0.08615663677
Monotonicity  Decreasing
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
0.6375              13     1.3%
1.0625              12     1.2%
1.2375              9      0.9%
0.8925              9      0.9%
0.945625            9      0.9%
0.5                 9      0.9%
0.825               8      0.8%
1.02                8      0.8%
0.6875              8      0.8%
0.584375            8      0.8%
Other values (314)  907    90.7%

Minimum 10 values
Value   Count  Frequency (%)
0.34    1      0.1%
0.357   3      0.3%
0.374   1      0.1%
0.3825  2      0.2%
0.391   1      0.1%
0.3995  5      0.5%
0.4     1      0.1%
0.408   2      0.2%
0.41    2      0.2%
0.4165  4      0.4%

Maximum 10 values
Value     Count  Frequency (%)
1.71875   3      0.3%
1.703125  2      0.2%
1.671875  2      0.2%
1.65625   2      0.2%
1.640625  2      0.2%
1.625     1      0.1%
1.609375  1      0.1%
1.59375   2      0.2%
1.5625    1      0.1%
1.546875  2      0.2%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for data lying exactly on a non-horizontal straight line, r is +1 or -1 regardless of the line's slope.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
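A minimal pure-Python sketch of this definition, which also checks the location/scale invariance described above. The function name `pearson_r` is hypothetical and not part of any profiling library; in practice a library routine would be used instead.

```python
import math

def pearson_r(x, y):
    """Pearson's r: covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # The 1/n normalisation is used throughout; it cancels in the ratio,
    # so using 1/(n-1) everywhere would give the same r.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when one variable is constant")
    return cov / (sd_x * sd_y)

x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]
# Shifting and positively rescaling either variable leaves r unchanged...
print(pearson_r(x, y), pearson_r([3 * a + 10 for a in x], [0.5 * b - 2 for b in y]))
# ...while a negative rescaling only flips its sign.
print(pearson_r([-a for a in x], y))
```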

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one replaces each variable by its ranks and divides the covariance of the two rank variables by the product of their standard deviations; in other words, ρ is Pearson's r computed on the ranks.
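A minimal sketch of ρ as Pearson's r on ranks. It assumes the common convention that tied values share the average of the ranks they span; the function names are hypothetical.

```python
import math

def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    # Plain sums without the 1/n factor, which cancels in the ratio.
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    sd_rx = math.sqrt(sum((a - mean_rx) ** 2 for a in rx))
    sd_ry = math.sqrt(sum((b - mean_ry) ** 2 for b in ry))
    return cov / (sd_rx * sd_ry)

# A monotonic but nonlinear relationship: rho is 1 up to floating-point rounding.
print(spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125]))
```

Because only the ordering matters, any strictly increasing transformation of either variable leaves ρ unchanged, which is why it picks up the cubic relationship that Pearson's r would score below 1.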

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations: a pair of observations is concordant if X and Y order the two observations the same way, and discordant if they order them in opposite ways. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
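A minimal O(n²) sketch of this pair-counting definition, often called τ-a. The function name is hypothetical. Library routines such as SciPy's `scipy.stats.kendalltau` default to the tie-adjusted τ-b, so their results differ from this one when ties are present, as they are for most columns in this dataset.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Kendall's tau (tau-a): (concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both variables order the pair the same way.
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0 means a tie in x or y; tau-a counts such pairs as neither.
    total_pairs = n * (n - 1) / 2
    return (concordant - discordant) / total_pairs

# 5 concordant and 1 discordant pair out of 6, so tau = 4/6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```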

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators commonly used for Cramér's V are known to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013, which can be found here.
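A minimal sketch of a bias-corrected Cramér's V using the standard Bergsma-style correction: φ² = χ²/n is reduced by (k-1)(r-1)/(n-1), and the table dimensions r and k are replaced by r - (r-1)²/(n-1) and k - (k-1)²/(n-1). It assumes the plain Pearson χ² statistic without a continuity correction; the profiling tool's exact implementation may differ, and the function name is hypothetical.

```python
import math
from collections import Counter

def cramers_v_bias_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of nominal values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (>= 2)")
    cell_counts = Counter(zip(x, y))
    row_totals = Counter(x)  # marginal counts of each x category
    col_totals = Counter(y)  # marginal counts of each y category
    # Pearson chi-squared statistic over the full r x k contingency table.
    chi2 = 0.0
    for a, n_a in row_totals.items():
        for b, n_b in col_totals.items():
            expected = n_a * n_b / n
            chi2 += (cell_counts[(a, b)] - expected) ** 2 / expected
    r, k = len(row_totals), len(col_totals)
    phi2 = chi2 / n
    # Bias corrections applied to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a variable with a single category carries no association
    return math.sqrt(phi2_corr / denom)

# Perfectly associated categories give V close to 1; independent ones give 0.
print(cramers_v_bias_corrected(["a", "b", "c"] * 30, ["p", "q", "r"] * 30))
print(cramers_v_bias_corrected(["a", "a", "b", "b"] * 25, ["p", "q", "p", "q"] * 25))
```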

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
Heatmap: the correlation heatmap measures nullity correlation, that is, how strongly the presence or absence of one variable affects the presence of another.
Dendrogram: the dendrogram lets you correlate variable completion more fully, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
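A minimal sketch of nullity correlation for two columns. It assumes that nullity correlation is Pearson's r between the 0/1 missingness indicators of the columns, which matches how the missingno library describes its heatmap, but treat this as an assumption rather than the report's exact implementation. The function name and example columns are hypothetical.

```python
import math

def nullity_correlation(col_a, col_b):
    """Pearson's r between the missingness indicators of two columns (None marks a missing cell)."""
    a = [1.0 if v is None else 0.0 for v in col_a]
    b = [1.0 if v is None else 0.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        # A column that is never (or always) missing has no nullity variation.
        return None
    return cov / math.sqrt(var_a * var_b)

# Hypothetical columns whose values go missing in the same rows.
col_x = ["u", None, "v", None]
col_y = [1, None, 2, None]
print(nullity_correlation(col_x, col_y))
```

A value near +1 means the two columns tend to be missing together, near -1 means one is present exactly when the other is missing, and `None` flags a column with no missing-value variation.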

Sample

First rows

first_namelast_namegenderpast_3_years_bike_related_purchasesDOBjob_titlejob_industry_categorywealth_segmentdeceased_indicatorowns_cartenureaddresspostcodestatecountryproperty_valuationUnnamed: 16Unnamed: 17Unnamed: 18Unnamed: 19Unnamed: 20RankValue
0ChickieBristerMale861957-07-12General ManagerManufacturingMass CustomerNYes1445 Shopko Center4500QLDAustralia60.560.70000.8750000.743750111.718750
1MorlyGeneryMale691970-03-22Structural EngineerPropertyMass CustomerNNo1614 Mccormick Park2113NSWAustralia110.890.89001.1125000.945625111.718750
2ArdelisForresterFemale101974-08-28Senior Cost AccountantFinancial ServicesAffluent CustomerNNo105 Colorado Crossing3505VICAustralia51.011.01001.0100001.010000111.718750
3LucineStuttFemale641979-01-28Account Representative IIIManufacturingAffluent CustomerNYes5207 Annamark Plaza4814QLDAustralia10.871.08751.0875001.087500441.703125
4MelindaHadleeFemale341965-09-21Financial AnalystFinancial ServicesAffluent CustomerNNo19115 Montana Place2093NSWAustralia90.520.52000.6500000.650000441.703125
5DruciBrandliFemale391951-04-29Assistant Media PlannerEntertainmentHigh Net WorthNYes2289105 Pearson Terrace4075QLDAustralia70.430.53750.5375000.537500661.671875
6RutledgeHalltMale231976-10-06Compensation AnalystFinancial ServicesMass CustomerNNo87 Nevada Crossing2620NSWAustralia70.400.40000.4000000.340000661.671875
7NancieVianFemale741972-12-27Human Resources Assistant IIRetailMass CustomerNYes1085 Carioca Point4814QLDAustralia50.580.72500.7250000.616250881.656250
8DuffKarlowiczMale501972-04-28Speech PathologistManufacturingMass CustomerNYes5717 West Drive2200NSWAustralia101.031.28751.6093751.367969881.656250
9BarthelDocketMale721985-08-02Accounting Assistant IVITMass CustomerNYes1780 Scofield Junction4151QLDAustralia50.841.05001.0500000.89250010101.640625

Last rows

first_namelast_namegenderpast_3_years_bike_related_purchasesDOBjob_titlejob_industry_categorywealth_segmentdeceased_indicatorowns_cartenureaddresspostcodestatecountryproperty_valuationUnnamed: 16Unnamed: 17Unnamed: 18Unnamed: 19Unnamed: 20RankValue
990JermaineBagshaweFemale601954-05-14Help Desk OperatorPropertyMass CustomerNYes9260 Briar Crest Drive4209QLDAustralia60.510.63750.6375000.5418759889880.3995
991BryanJachtymMale591974-05-15Automation Specialist IManufacturingMass CustomerNYes1556 Moland Crossing3356VICAustralia30.690.86250.8625000.7331259889880.3995
992RenieLaundonFemale321973-12-18Assistant Media PlannerEntertainmentMass CustomerNYes81 Shelley Pass4118QLDAustralia31.081.35001.3500001.1475009939930.3910
993WeidarEtheridgeMale381959-07-13Compensation AnalystFinancial ServicesMass CustomerNYes60535 Jay Point2422NSWAustralia40.921.15001.1500000.9775009949940.3825
994DathaFishburnFemale151990-07-02Office Assistant IVRetailMass CustomerNNo36 Caliangt Way3079VICAustralia120.770.77000.9625000.8181259949940.3825
995FerdinandRomanettiMale601959-10-07ParalegalFinancial ServicesAffluent CustomerNNo92 Sloan Way2200NSWAustralia70.790.79000.7900000.7900009969960.3740
996BurkWortleyMale222001-10-17Senior Sales AssociateHealthMass CustomerNNo604 Union Crossing2196NSWAustralia100.760.76000.9500000.8075009979970.3570
997MelloneyTembyFemale171954-10-05Budget/Accounting Analyst IVFinancial ServicesAffluent CustomerNYes1533475 Fair Oaks Junction4702QLDAustralia20.851.06251.0625001.0625009979970.3570
998DickieCubbiniMale301952-12-17Financial AdvisorFinancial ServicesMass CustomerNYes1957666 Victoria Way4215QLDAustralia21.091.36251.3625001.1581259979970.3570
999SylasDuffillMale561955-10-02Staff Accountant IVPropertyMass CustomerNYes1421875 Grover Drive2010NSWAustralia90.470.58750.7343750.624219100010000.3400